223 research outputs found

    New Classification and Generative Model for Medical Visual Question Answering

    Medical images play an important role in the medical domain. A mature medical visual question answering system can aid diagnosis, but no satisfactory method yet exists for this comprehensive problem. Considering that there are many different types of questions, we propose a model called CGMVQA, with both classification and answer-generation capabilities, to turn this complex problem into multiple simpler ones. We adopt data augmentation on images and tokenization on texts. We use a pre-trained ResNet152 to extract image features and add three kinds of embeddings together to handle texts. We reduce the parameters of the multi-head self-attention transformer to cut the computational cost. We adjust the masking and output layers to change the functions of the model. This model establishes new state-of-the-art results on the ImageCLEF 2019 VQA-Med data set: 0.640 classification accuracy, 0.659 word matching, and 0.678 semantic similarity. This suggests that CGMVQA is effective in medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
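    The "three kinds of embeddings added together" can be illustrated with a minimal numpy sketch, assuming BERT-style token, position, and segment embeddings (table sizes and dimensions here are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, DIM = 1000, 32, 16
tok_table = rng.normal(size=(VOCAB, DIM))    # token embedding table
pos_table = rng.normal(size=(MAX_LEN, DIM))  # position embedding table
seg_table = rng.normal(size=(2, DIM))        # segment table, e.g. 0 = question, 1 = answer

def embed(token_ids, segment_ids):
    """Sum token, position, and segment embeddings per position,
    mirroring 'add three kinds of embeddings together'."""
    positions = list(range(len(token_ids)))
    return tok_table[token_ids] + pos_table[positions] + seg_table[segment_ids]
```

    Each input position thus carries its identity, its place in the sequence, and which text segment it belongs to in a single vector.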

    WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

    Generative adversarial networks (GANs) were first proposed in 2014 and have been widely used in computer vision, for tasks such as image generation. However, GANs for text generation have made slow progress. One reason is that the discriminator's guidance for the generator is too weak: the generator only receives a "true or false" probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with the Wasserstein distance in experiments. In this paper, we propose an improved neural network based on RelGAN and Wasserstein loss, named WRGAN. Unlike RelGAN, we modify the discriminator network structure with 1D convolutions of multiple kernel sizes. Correspondingly, we also change the loss function of the network to a gradient-penalty Wasserstein loss. Our experiments on multiple public datasets show that WRGAN outperforms most existing state-of-the-art methods, and Bilingual Evaluation Understudy (BLEU) scores are improved with our method.
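    The gradient-penalty Wasserstein critic loss can be sketched in numpy. This is a generic WGAN-GP formulation, not the paper's exact network: to avoid automatic differentiation, it assumes a toy linear critic f(x) = w·x, whose input gradient is simply w everywhere:

```python
import numpy as np

def wgan_gp_critic_loss(w, real, fake, lam=10.0):
    """Wasserstein critic loss with gradient penalty for a linear critic
    f(x) = x @ w. For a linear critic the input gradient at any point
    (including real/fake interpolations) is just w, so the penalty is analytic."""
    eps = np.random.rand(real.shape[0], 1)
    interp = eps * real + (1 - eps) * fake   # where a nonlinear critic's gradient
                                             # would normally be evaluated
    d_real = real @ w                        # critic scores on real samples
    d_fake = fake @ w                        # critic scores on generated samples
    grad_norm = np.linalg.norm(w)            # ||∇f|| is constant for a linear f
    penalty = lam * (grad_norm - 1.0) ** 2   # push gradient norm toward 1
    # critic maximizes E[f(real)] - E[f(fake)]; its loss is the negative, plus penalty
    return d_fake.mean() - d_real.mean() + penalty
```

    Unlike a plain "true or false" probability, the critic's score difference gives the generator a graded signal of how far its samples are from the real distribution.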

    Bert4CMR: Cross-Market Recommendation with Bidirectional Encoder Representations from Transformer

    Real-world multinational e-commerce companies, such as Amazon and eBay, serve multiple countries and regions. These markets carry similar goods but have different users; some markets are data-scarce, while others are data-rich. In recent years, cross-market recommendation (CMR) has been proposed to enhance data-scarce markets by leveraging auxiliary information from data-rich markets. Previous works fine-tune a pre-trained model on the local market after freezing part of the parameters, or introduce inter-market similarity into the local market, to improve CMR performance. However, they generally do not consider eliminating the mutual interference between markets, so existing methods can neither learn unbiased general knowledge nor efficiently transfer reusable information across markets. In this paper, we propose a novel attention-based model called Bert4CMR to simultaneously improve recommendation performance in all markets. Specifically, we employ the attention mechanism to capture user interests by modelling user behavioural sequences. We pre-train the proposed model on global data to learn general knowledge of items, then fine-tune it on specific target markets to perform local recommendations. We propose market embedding to model the bias of each market and reduce the mutual interference between parallel markets. Extensive experiments conducted on seven markets show that our model is state-of-the-art: it outperforms the second-best model by 4.82%, 4.73%, 7.66% and 6.49% on average across the seven datasets in terms of four metrics, respectively. Ablation experiments indicate that our model learns general knowledge through global data and shields markets from mutual interference.
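    The market-embedding idea can be sketched as follows: a shared global item table plus one learned bias vector per market, so market-specific bias is absorbed by a dedicated vector instead of leaking into the shared item representations. All names, market IDs, and dimensions below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8
item_emb = rng.normal(size=(100, EMB_DIM))     # shared (global) item embedding table
market_emb = {"us": rng.normal(size=EMB_DIM),  # one bias vector per market
              "de": rng.normal(size=EMB_DIM)}

def encode_sequence(item_ids, market):
    """Pool the user's item sequence from the shared table, then add the
    market embedding so per-market bias lives in its own vector."""
    pooled = item_emb[item_ids].mean(axis=0)   # stand-in for the attention encoder
    return pooled + market_emb[market]
```

    Mean pooling here stands in for the attention-based sequence encoder; the point of the sketch is only how a market vector offsets an otherwise shared representation.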

    Knowledge Graph based Question and Answer System for Cosmetic Domain

    With the development of e-commerce, customers' requirements for products have become more detailed, and the workload of customer service consultants has increased massively. However, manufacturers are not obliged to list specific product ingredients on their websites. It is therefore necessary to construct a knowledge-base question answering (KBQA) system to relieve the pressure on online customer service and effectively help customers find suitable skincare products. In the cosmetic field, different basic cosmetics may have varied effects depending on their ingredients. In this paper, we use the CosDNA website and online cosmetic websites to construct a cosmetic product knowledge graph that connects cosmetics, ingredients, skin types, and effects. We then build a question answering system on top of this knowledge graph so that users can look up product details directly and make decisions quickly.
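    At its core, answering over such a graph reduces to matching (entity, relation, ?) triples. A minimal sketch with a toy triple store follows; the product and ingredient names are invented for illustration and are not from CosDNA:

```python
def answer(kg, entity, relation):
    """Return all objects o such that (entity, relation, o) is in the graph."""
    return sorted(o for s, r, o in kg if s == entity and r == relation)

# Hypothetical triples of the kind the cosmetic graph would hold.
kg = [
    ("ProductA", "contains", "niacinamide"),
    ("ProductA", "contains", "glycerin"),
    ("niacinamide", "suits", "oily skin"),
]
```

    A real system would first map the user's question to an entity and relation (entity linking and relation classification) before this lookup step.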

    Intention Detection Based on Siamese Neural Network With Triplet Loss

    Understanding the user's intention is an essential task for the spoken language understanding (SLU) module in a dialogue system, as it provides vital information for managing and generating future actions and responses. In this paper, we propose a triplet training framework based on the multiclass classification approach for the intention detection task. Specifically, we utilize a Siamese neural network architecture with metric learning to construct a robust and discriminative utterance embedding model. We use a modified RMCNN model and a fine-tuned BERT model as Siamese encoders to train on utterance triplets from different semantic aspects. The triplet loss can effectively distinguish fine details between inputs by learning a mapping from utterances to a compact Euclidean space. Once this mapping is learned, intention detection can be implemented with standard techniques using the pre-trained embeddings as feature vectors. In addition, we use a fusion strategy to enhance utterance representations in the downstream intention detection task. We conduct experiments on several benchmark intention detection datasets: Snips, ATIS, the Facebook multilingual task-oriented datasets, Daily Dialogue, and MRDA. The results show that the proposed method effectively improves recognition performance and achieves new state-of-the-art results on single-turn task-oriented datasets (Snips, Facebook) and a multi-turn dataset (Daily Dialogue).
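    The triplet margin loss at the heart of this framework can be stated in a few lines of numpy. This is the standard formulation on already-encoded embedding vectors, not the paper's full training loop:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on embedding vectors: pull the anchor toward a
    same-intent (positive) utterance and push it at least `margin` farther
    from a different-intent (negative) one."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)    # zero once the margin is satisfied
```

    Once embeddings trained this way cluster by intent in Euclidean space, a simple nearest-centroid or linear classifier on top suffices for detection.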

    Emotion Expression Extraction Method for Chinese Microblog Sentences

    With the rapid spread of Chinese microblogs, a large number of microblog topics are generated in real time, and more and more users pay attention to the emotion expressions of opinionated sentences in different topics. Labelling these emotion expressions manually is challenging, so this paper proposes an emotion expression extraction method that processes millions of user-generated opinionated sentences automatically. The proposed method comprises two tasks: emotion classification and opinion target extraction. We first use a lexicon-based emotion classification method to compute the emotion values in the emotion label vectors of opinionated sentences. These label vectors are then revised by an unsupervised emotion label propagation algorithm. After candidate opinion targets are extracted, the opinion target extraction task is performed by a random walk-based ranking algorithm that ranks candidate opinion targets by considering both the connections between candidates and the textual similarity between opinionated sentences. Experimental results demonstrate the effectiveness of the algorithms in the proposed method.
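    Random walk-based ranking of candidates over a connection graph can be sketched as a PageRank-style power iteration. This is the generic algorithm family the abstract names, with an arbitrary adjacency matrix standing in for the paper's combined connection-plus-similarity graph:

```python
import numpy as np

def random_walk_rank(adj, damping=0.85, iters=100):
    """Rank nodes (candidate opinion targets) by a damped random walk over
    a weighted graph given as an adjacency matrix."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    P = adj / col_sums                 # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)            # start from the uniform distribution
    for _ in range(iters):
        r = (1 - damping) / n + damping * (P @ r)
    return r
```

    Candidates that many other well-connected candidates point to accumulate probability mass and rank highest.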

    Data-Driven Channel Pruning towards Local Binary Convolution Inverse Bottleneck Network Based on Squeeze-and-Excitation Optimization Weights

    This paper proposes a model pruning method based on local binary convolution (LBC) and squeeze-and-excitation (SE) optimization weights. We first propose an efficient depthwise-separable convolution model based on the LBC kernel. By expanding the number of LBC kernels, we train a larger model with better results, but with more parameters and slower computation. We then extract the SE optimization weight of each SE module from data samples and score each LBC kernel accordingly. Based on the score of the convolution channel corresponding to each LBC kernel, we perform channel-based model pruning, which greatly reduces the number of model parameters and accelerates computation. The proposed pruning method is verified on an image classification database. Experiments show that, in models using the LBC kernel, recognition accuracy increases with the number of LBC kernels, and that after channel-based pruning guided by the SE optimization weights, the smaller model maintains a similar level of recognition accuracy.
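    The channel-scoring step can be sketched generically: average each channel's SE gate value over a batch of data samples, then keep only the top-scoring channels. The averaging rule and keep ratio here are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def prune_channels(se_weights, keep_ratio=0.5):
    """se_weights: (samples, channels) array of SE gate activations collected
    over data samples. Score each channel by its mean gate value and return
    the sorted indices of the channels to keep."""
    scores = se_weights.mean(axis=0)             # data-driven importance per channel
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]          # highest-scoring channels first
    return np.sort(keep)                         # sorted indices for slicing weights
```

    The returned indices can then be used to slice the convolution weight tensors, physically removing the low-scoring channels rather than merely zeroing them.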

    Hierarchical Network with Label Embedding for Contextual Emotion Recognition

    Emotion recognition is widely used in applications such as mental health monitoring and emotional management. It is usually treated as a text classification task, but it is more complex than that: the relations between emotions expressed in a text are non-negligible. In this paper, a hierarchical model with label embedding is proposed for contextual emotion recognition. Specifically, the hierarchical model learns the emotional representation of a given sentence from its contextual information. To exploit emotion correlations in recognition, a label embedding matrix is trained by joint learning and contributes to the final prediction. Comparison experiments are conducted on the Chinese emotional corpus RenCECps, and the results indicate that our approach performs well on the textual emotion recognition task.
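    How a label embedding matrix contributes to prediction can be shown in miniature: score each emotion label by the similarity between its embedding and the sentence representation. This is a generic label-embedding scoring head, assuming both live in the same vector space, not the paper's exact architecture:

```python
import numpy as np

def label_scores(sentence_repr, label_emb):
    """Score each emotion label by its dot product with the sentence
    representation, then normalize with a softmax. Because labels are vectors,
    correlated emotions can share directions in embedding space."""
    logits = label_emb @ sentence_repr          # (num_labels,) similarity scores
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()
```

    During joint training, both the sentence encoder and the label matrix are updated, so labels that co-occur end up with nearby embeddings.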